ABSTRACT 


Decades-old research has demonstrated the effects of virtual space on perception mostly with adult samples. Little is 
known about children’s ability to utilize spatial-temporal qualities from computerized settings. Past research with primary 
school children suggested that the ability to utilize spatial-temporal information is crucial for inferring cause-effect 
relationships of natural phenomena. However, children’s performance lagged behind when spatial-temporal qualities 
were presented on a computer screen. To investigate this matter further, 16 adults, 17 nursery, and 19 reception age 
children were tested individually (N=52) across three tasks: a virtual task, a less intense virtual task, and an actual 
spatial-temporal task. The results showed that: (1) young children performed poorly on virtual tasks; (2) children’s ability 
to process spatial-temporal information varied largely depending on the characteristics of the task; (3) spatial-temporal 
analysis in a virtual space required extra support from widely distributed domains operating attention and memory; and 
(4) the intensity of the information presentation on virtual displays influenced young children’s performance, but not 
adults’. The results may explain why some children do not perform well in, or benefit from, teaching and learning 
activities delivered via two- or three-dimensional settings: the ability to utilize the amount of spatial-temporal 
information varies widely across development, in particular when children cannot manipulate the intensity of the 
information they are exposed to. The missing third dimension (depth) in virtual tasks is challenging for both younger 
and older children, the majority of whom seem unable to compensate for it. Evolutionarily, our coping systems seem to 
be more advanced for extracting spatial-temporal information from real environments than from virtual ones. This poses 
a particular challenge for research that measures young children’s performance on computerized displays. 
1. INTRODUCTION 


The terms space and time are quite ambiguous, as their qualities are susceptible to various interpretations. 
Neither time nor space has an appearance. A mathematical space, for instance, allows us to represent a set of 
observations and quantitative measures (Gray, 1989); a topological space axiomatizes the shape of sets of 
points in terms of neighborhoods, enabling concepts such as wires, holes, convergence and continuity 
(Bourbaki, 1940). In Euclidean geometry, space is axiomatized in terms of rigid lines with fixed lengths, 
enabling concepts such as fixed trajectories in spacetime and geometric shapes like triangles and squares 
(see Euclid’s Elements). A virtual space is generated by machines, where spatial-temporal qualities are 
presented with the aid of a set of quantized cubic grid points to simulate a virtual form of reality 
(e.g. Hammoud et al., 2011). A mental space corresponds to imagery (Kosslyn, 1980, 1995), operating with 
the aid of working memory, a crucial property of communication and mental representations (Fauconnier, 
1994). 

Any kind of experience, in real or virtual space, is constituted through cognitive operations in which 
observable properties are registered by the senses. It is our exquisite visual sensitivity, for instance, that 
enables us to distinguish an apple from a pear in various environments. Moreover, looking at a static picture 
of an apple requires different cognitive resources than watching an apple rolling or falling, or imagining it in a 
different context. The pictorial form adopts spatial properties, which are also perceptual, but in a piecemeal 
and discrete manner: the stimulus of the apple is held spatially constant for the visual system. 
The dynamic form instead involves a temporal dimension, providing feedback on past, current, and 
future situations by enabling spatial-temporal representations to be encoded in an online fashion. In both 
cases (pictorial or online), what makes an entity or event perceptual is its (i) qualitative (e.g. color, 
texture, smell, topological shape), (ii) quantitative (e.g. length, width, depth, geometrical shape), and 
(iii) temporal properties (e.g. speed, succession, directionality), all of which are embedded in three- or 
four-dimensional spacetime. I call these three components perceptual primitives. 

In the example of the rolling or falling apple, the temporal onsets and offsets are observable. While the 
apple is motionless, its qualitative and quantitative features can be registered. Once it starts rolling or falling, 
various features move together, requiring its perceptual primitives to be bound across space and time, 
enabling online registrations (see e.g. Robertson, 2003). Although how binding occurs at the neural level 
remains a mystery, its role for spatial-temporal qualities in structuring the external world is fundamental 
(see e.g. Trehub, 2007), as our sense of physical reality is largely dependent on internal resources, rather than 
being developed only from sensory information. At the perceptual level, for instance, identifying and updating 
time-related changes in rapid successions is the main activity of the visual system (see also Raymond 
et al., 1992). Perception, however, is not informationally encapsulated from cognition, contrary to what 
Fodor (1983) argued. There are no known barriers preventing us from moving from perception to inference, 
or vice versa. Instead, perception is often described as having different, hierarchically organized stages 
(e.g. bottom-up processing requiring the capacity of top-down processing, such as attention) (Marr, 1982). 
At the analytical/cognitive level, the primitives are representational; they are mentally abstracted, integrated or 
interpreted by higher-level intellectual faculties, such as conceptual or schematic reasoning, imagination, or 
planning (see Fugelsang and Mareschal, 2013; Tolmie and Dündar-Coecke, in press, for the distinction 
between early perceptual and inferential processes). Evidently, spatial-temporal analysis has multiple forms, 
and employs various distinct cognitive resources utilized uniquely in perceptual and analytic processes 
(Dündar-Coecke et al., 2019b). This is probably why deep thinking is easier when we stop watching. 

The abilities required for perceptual processes in particular appear to emerge very early on. Based on 
infants’ looking time at contiguous and noncontiguous events, Leslie and Keeble (1987) found that even 
six-month-old infants are sensitive to the spatial-temporal characteristics of events and object interactions 
(see also Oakes & Cohen, 1990). These abilities become more recognizable during toddlerhood, in the sense that 
preschoolers take into account spatial-temporal contiguity in their inferences (see e.g. Schlottmann, 1999). 
Moreover, children and adults seem to use the same principles (e.g. priority, contiguity, succession) to infer 
cause-effect relations (see Bullock et al., 1982 for children, and Michotte, 1946/1963 for adults). However, we 
need to bear in mind that the majority of research investigating spatial-temporal analysis has been conducted 
in real world environments and employed actual materials (e.g. machines, toys, pictures, real objects). We 
know very little about whether perception of spatial-temporal qualities relies on the same principles in virtual 
and physical environments, or more specifically what happens when perceptual primitives are presented on a 
computer screen. Knowing how perceptual experiences are organized in the real world may not be sufficient 
to understand virtual reality and its effects on perceptual/inferential experiences. If we could identify how 
perceptual primitives in these processes work and what the boundaries are, we could use this knowledge for 
teaching and learning practices, and we could even design a language that allows communicating with 
virtual perceptual experiences (see Carr & England’s 1995 proposal of ‘perceptual language’). 

According to Carr and England (1995), virtual space is the first level of virtualization, followed by the 
virtual image and virtual environments. At the first level, the user perceives a three-dimensional, but flat 
layout of objects in space, which is akin to picture viewing. The virtual environment is the most inclusive and 
provides the largest variety of information sources to the senses. However, even in this inclusive setting, the 
sense of reality is constructed indirectly from symbolic, geometric, and dynamic information presentations. 
In this environment, although many aspects of the actual forms can be imitated identically, a user 
actually interacts with the objects via an interface (Saha et al., 1994). 

A few studies have attempted to compare human perception in the real versus virtual environments. 
Lampton et al. (1995), for instance, found that humans are less accurate in estimating distance in virtual 
environments than in the real world. Witmer and Kline (1998) investigated the factors influencing perceived 
distance estimates. They conducted two experiments to assess the contribution of various distance cues, 
including visual, cognitive, and proprioceptive cues. In the first experiment, participants were static and they 
needed to estimate the distance to a cylinder placed at various points. The effects of floor texture/patterns, 




16th International Conference on Cognition and Exploratory Learning in Digital Age (CELDA 2019) 


and object size were also considered. In the second experiment, participants were asked to move and they 
needed to estimate route-segment and total route distances. The effect of movement was considered. 
The authors found that traversing a distance improved distance estimates, and that the errors were smaller in 
the physical world. In the virtual environment, when the participants moved faster, their estimates were less 
accurate. Another study compared the fidelity of virtual versus real environments. Adult participants explored 
a virtual room by using a head mounted display. They were asked to estimate the dimensions of the room in 
meters, and afterwards they answered a questionnaire aiming to explore their perception of the physical 
properties of the room. The authors found that people perceived the dimensions of the virtual room quite 
accurately relative to its actual size, except for the height dimension. However, in this study, participants’ 
movement was restricted, as it had previously been found to affect perception (e.g. walking speed, size of the 
environment). Cheng et al. (2014) also concluded that human performance in a virtual environment was less 
accurate, with a greater error rate. Saleeb (2015) found that virtual dimensions are perceived as smaller than 
their physical counterparts, indicating discrepancies in human appreciation of metrics regarding the virtual 
and physical dimensions. 

Virtual space evidently scales metrics differently, so that a user needs to integrate perceptually 
different widths, depths, heights, and speeds. There is some evidence that continuous extraction of 
spatial-temporal dimensions is difficult even in actual environments. Testing 107 5-to-11-year-olds individually, 
a previous study showed that children did well when the tasks segmented object motion in the real 
environment. In this segmented version, changes in spatial-temporal information were presented 
proportionally following a logical consecutive order (e.g. liquid flowed from an upper flask to a lower flask 
in five consecutive stages, and the liquid level changed in both flasks as a function of liquid flow by time). In 
another task, spatial-temporal information was presented on a computer screen (children were asked to 
compare the speed of three bunnies and judge the duration with the distance taken). The majority of 
children, in particular the young ones, failed to extrapolate spatial-temporal qualities from these computerized 
displays. Response profiles for each age group and one-way ANOVAs showed age-related increases, but 
growth occurred later, with no difference between Year 1 and Year 3 children, though both differed significantly 
from Year 5. Accurate responses in this task depended on processing the successive spatial states for each 
object, with perceptual resources needing to be sustained for visually enriched loads over time 
(Dündar-Coecke et al., 2019a). However, this study did not say anything about the environmental effect 
(virtual vs. actual) on spatial-temporal analysis. 

It is common for even toddlers to start watching cartoons on televisions/tablets, dealing with the intense 
information presented. Furthermore, a recent trend is for psychologists/educationalists to use computerized 
tasks to measure children’s competences. Nevertheless, decades of research have ignored the impact of 
virtual versus physical space on the ability to extract spatial-temporal information; in particular we know 
almost nothing about its development. The present study compared young children’s response profiles with 
adults’. The aim was to understand whether children’s failure was related to (a) differences in processing 
spatial-temporal characteristics in virtual and actual space, (b) the fact that the intensity of the information 
load differed in virtual and actual spaces, or (c) the possibility that extrapolating future 
states was simply difficult for children. If option ‘a’ is more likely, this will highlight the importance of the 
primitives and the characteristics of virtual and physical spaces in inducing perceptual processes. If ‘b’ and 
‘c’ are more likely, this will call for a cognitive approach, suggesting that age-related changes in 
spatial-temporal cognition are highly linked to changes in e.g. attentional processes, and therefore that 
performance on the tasks is a consequence of an individual’s cognitive competences. If option ‘c’ is more 
likely, then, regarding the temporal horizon, children must have found it difficult to locate future relations, as 
assembled in memories. The past may be chronologically easier to comprehend (even in deterministic 
systems) because the future is less supported by past experience (see e.g. McCormack, 2015 for a review). 
The present study therefore aimed to provide further evidence on these options by examining younger 
children’s performance on the three tasks to obtain results comparable with previously investigated primary age 
trends. The sample also included adults in order to extend the comparability and to examine whether 
immature and mature forms of spatial-temporal analysis exist. 
2. METHOD AND RESULTS 


2.1 Method 


The study utilized an experimental design employing three age groups, two spanning the English nursery and 
reception class age range (4 and 5-year-olds), and the third involving adults. Three tasks were given to all 
participants in random order within a single one-to-one session, followed by a short conversation aiming to 
obtain their further thoughts on the tasks. 


2.1.1 Participants 


Children were recruited from two schools, a nursery and a primary, located in Oxford, UK. Adult 
participants who volunteered to take part were also recruited from the same city. Covering a range of 
socioeconomic backgrounds, the sample included 17 nursery children (N), mean age = 46.4, sd = 3.04 months; 
19 reception age children (R), mean age = 64.74, sd = 3.9 months; and 16 adults (A), mean age = 36.6, 
sd = 11.3 years. The sample consisted of typically developing children and adults with no known cognitive 
disability. 


2.1.2 Materials and Procedure 


Test sessions took place in a working room, or with some children out of class in a quiet area within their 
schools. Testing took approximately 8 minutes per participant on average (min = 5, max = 13). All 
responses were recorded manually on score sheets at the time, and participants’ explanations were noted for 
later checking. Extraneous variables were kept constant so as not to influence 
participants’ responses, such as the laptop, procedure, the researcher conducting the experiments, 
explanations of the tasks, materials, brightness of the screen, participants’ distance from the computer etc. 
The clockwork toys generated a reasonable amount of mechanical noise, but participants were requested to 
ignore that detail and focus on the toys’ movement. 


2.1.3 Measures 


Virtual speed task 1 (VS1). The initial version of this task was developed and used in Dündar-Coecke et al. 
(2019b) with 17 trials. On a Macintosh laptop (resolution 1440 x 900 pixels) participants saw computer 
animations of three bunnies (red, yellow, black) racing towards a carrot from different start positions at 
different speeds (Figure 1a), with the animation stopping before they reached it. Children judged which 
bunny would arrive at the carrot first. The task began with two practice items, followed by 13 trials gradually 
increasing in difficulty: the stop time reduced from 4 to 2 seconds, as did the difference between the three 
bunnies in start point and relative speed, making differences in arrival time harder to distinguish, and the 
period available within which to track the differences shorter. Each time children were asked “Which bunny 
would be the winner?” The number of correct responses was recorded (0-13). At the end, participants were 
asked to judge the difficulty level of the task, and make comments if they had any. 
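
The judgment required in this task can be illustrated with a minimal sketch. It assumes, purely for illustration, that each bunny moves at a constant speed along a horizontal line, so that arrival time is the remaining distance divided by speed. The names Runner, arrival_time and predicted_winner, the target position, and all numerical values are hypothetical and are not taken from the actual stimuli.

```python
from dataclasses import dataclass


@dataclass
class Runner:
    name: str
    start: float  # hypothetical start position, in screen units
    speed: float  # hypothetical constant speed, in screen units per second


def arrival_time(runner: Runner, target: float) -> float:
    """Time needed to reach the target, assuming constant speed."""
    return (target - runner.start) / runner.speed


def predicted_winner(runners: list[Runner], target: float) -> str:
    """Name of the runner with the shortest arrival time."""
    return min(runners, key=lambda r: arrival_time(r, target)).name


# Hypothetical trial: the bunny starting furthest back is also the fastest,
# so start position alone is a misleading cue.
trial = [
    Runner("red", start=0.0, speed=100.0),
    Runner("yellow", start=150.0, speed=80.0),
    Runner("black", start=300.0, speed=60.0),
]
carrot = 900.0  # hypothetical target position

for bunny in trial:
    print(bunny.name, arrival_time(bunny, carrot))
print("winner:", predicted_winner(trial, carrot))
```

In such a configuration a respondent who relies only on proximity to the target would pick the wrong bunny; the correct answer requires combining distance and speed over the short viewing period.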


Figure 1. Example configurations of bunnies at the start of the (a) VS1 and (b) VS2 tasks 
Virtual speed task 2 (VS2). This task was exactly the same as the previous speed task but with one 
difference: rather than three bunnies, participants saw computer animations of two bunnies (red and yellow, to 
ensure visual discernibility) racing toward the target item from different start positions at different speeds 
(Figure 1b). In the original speed task used in Dündar-Coecke et al.’s (2019a) study, scores were 
positively skewed with a long tail in young children’s responses, indicating that the task was difficult for the 
majority of primary school children (5-to-11-year-olds), in particular for the younger age groups. Further 
analyses showed that the majority of children struggled consistently with 8 trials. These 8 trials were chosen 
to examine whether the intensity of the information load played a role in this. They were 
presented with less intensity by removing the bunny located in the middle in all the trials so as to keep the 
distance constant. Therefore the distance between the two bunnies widened. The number of correct 
responses ranged between 0 and 8. As before, participants were asked to judge the difficulty level of the 
task, and make comments if they had any. 

Actual speed task (AS). This task was adapted from Piaget (1969/2006), who demonstrated that 
primitive/early understanding of space and time was highly dependent on duration-distance judgments. 
Participants compared three clockwork toys in each trial (e.g. a snail, a ladybird, and a doggy, Figure 1c), 
differing in speed, duration and distance towards a tunnel. The participants were shown half of the run; 
the other half was hidden under a cardboard tunnel so that participants saw neither the entire run nor the 
winner. Travel durations were either 6 or 8 seconds, but the visible parts ranged from 3 to 
4 seconds with varying distances. The experimenter used a plastic ruler to make sure that participants 
observed a maximum of the first four seconds of the race. In total nine clockwork toys were used; two of 
them were replaced in each trial to avoid a conditioning effect. The task began with one practice item, in 
which a clockwork toy was introduced and an explanation was given as to how it moves toward the tunnel. 
The practice trial was followed by 5 trials gradually increasing in difficulty (e.g. the speeds of the ladybird, 
crocodile, and monkey were slightly different; the crocodile was the fastest, the ladybird in the middle, 
and the monkey slightly slower). These three toys were shown in single trials separately. The last two trials 
included two of them at the same time to increase the difficulty (e.g. ladybird, monkey, doggy). Once the 
winding keys were set up, the three objects were put behind the cardboard wall at varying distances/angles 
from the tunnel, and the race was started for all objects at the same time. Each time only one item 
travelled the longer distance with the highest speed. This task aimed to replicate the results of the first virtual 
task. Participants needed to guess the winner from the distance to the end point and the velocity of the object. 
The three objects traveled either (a) a different distance in the same time, or (b) the same distance in a 
different time. The number of correct responses ranged between 0 and 5 over five trials. 


2.2 Results 


Analyses utilized data from all 52 participants who completed testing appropriately. The observed power for 
ANOVA was 0.90, for regression analyses it was 0.98. 

The means for each age group (Table 1) show that young children did not perform well 
(proportionate to the maximum) on VS1, but performed relatively well on VS2 and well on AS. 
One-way ANOVAs showed significant age-related progression on VS1 and VS2 performance; 
for VS1, the differences between the groups were highly significant, F(2,51)=35.525, p<.001, partial 
eta squared=.418. Welch and Brown-Forsythe robust tests were also highly significant (p<.001), indicating 
later growth for responses in this task. 


Table 1. Mean score (standard deviation) by age group on VS1 (max=13), VS2 (max=8), and AS (max=5) 


Tasks Nursery Reception Adults 
VS1 5.3 (3.2) 8.7 (2.8) 12.9 (0.4) 
VS2 6.3 (1.8) 7.4 (0.9) 8.0 (0.0) 
AS 4.2 (1.4) 4.8 (0.9) 5.0 (0.0) 
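
As an illustration of how a one-way ANOVA partitions variance, the following sketch recomputes an F ratio for VS1 from the group sizes, means and standard deviations given above. It is not the original analysis: the inputs are rounded summary statistics rather than raw scores, so the resulting value can only approximate the reported one, and the function name anova_from_summary is hypothetical. Degrees of freedom are computed here as k-1 between groups and N-k within groups.

```python
def anova_from_summary(groups):
    """One-way ANOVA from summary statistics.

    groups: list of (n, mean, sd) tuples, one per group.
    Returns (F, df_between, df_within).
    """
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total

    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups sum of squares, recovered from each group's sd.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# VS1 rows of Table 1: (n, mean, sd) for nursery, reception, adults.
vs1 = [(17, 5.3, 3.2), (19, 8.7, 2.8), (16, 12.9, 0.4)]

f_ratio, df_b, df_w = anova_from_summary(vs1)
print(f"F({df_b},{df_w}) = {f_ratio:.2f}")
```

The very small standard deviation in the adult group also illustrates why robust tests such as Welch and Brown-Forsythe are informative here: the groups clearly differ in variance.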


Zero-order correlations showed that there were no significant associations between the tasks and either 
gender or socioeconomic status (SES). Only age correlated highly with VS1 (r=.767, p<.001) and VS2 
(r=.516, p<.001), confirming the ANOVA results; VS1 correlated highly with VS2 (r=.733, 
p<.001) and moderately with AS (r=.364, p<.05). However, VS2 was not associated with AS (r=.102, p>.05). 
When controlling for age, the patterns remained similar with a slight reduction in correlations, r(partial)=.613, 
p<.001; and .214, p<.01, suggesting that age was not the only factor and that the characteristics of the tasks 
also mattered. Participants’ feedback on this matter pointed to potential implications, as discussed below. 


Table 2. Zero-order and partial correlations 


        VS1       VS2       AS 
Age     .767***   .516***   .305*** 
VS1     1         .733***   .364** 
VS2     .613***   1         .102 
AS      .214      -.068     1 


Zero-order correlations above diagonal, N=52; partial correlations below 
diagonal; *p<.05, **p<.01, ***p<.001 
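
The relation between the zero-order and partial correlations can be checked with the standard first-order partial correlation formula, which removes the shared association with a third variable (here, age). The sketch below applies it to the zero-order values reported above; the function name partial_r is hypothetical, and because the inputs are rounded to three decimals, the outputs may differ from the tabled values in the third decimal.

```python
from math import sqrt


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y with z partialled out."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations with age, as reported in the text and Table 2.
r_age = {"VS1": 0.767, "VS2": 0.516, "AS": 0.305}
# Zero-order correlations between pairs of tasks.
r_tasks = {("VS1", "VS2"): 0.733, ("VS1", "AS"): 0.364, ("VS2", "AS"): 0.102}

for (x, y), r_xy in r_tasks.items():
    value = partial_r(r_xy, r_age[x], r_age[y])
    print(f"{x}-{y} controlling for age: {value:.3f}")
```

The VS1-VS2 association remains substantial after age is removed, whereas the associations involving AS shrink toward zero, which is the pattern described in the text.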


Further regression analyses showed that age explained the majority of variance (58.8%) in VS1 
performance (β=.767, p<.001), 26.6% of the variance in VS2 (β=.516, p<.001), and nearly 9% in AS performance 
(β=.305, p<.05). Neither SES nor gender was significant. This confirms the strong age effect for 
VS1 and the slightly weaker one for VS2, and suggests that the age effect was much weaker in the actual 
speed task. 
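
The percentages of variance above follow from the coefficients under one assumption: that each regression had age as its single predictor and that the coefficients are standardized, in which case the coefficient equals the zero-order correlation and the proportion of variance explained is its square. A minimal sketch of that arithmetic, with the hypothetical helper variance_explained:

```python
def variance_explained(beta: float) -> float:
    """Proportion of variance explained in a one-predictor regression,
    assuming beta is the standardized coefficient."""
    return beta ** 2


# Coefficients for age as reported in the text.
betas = {"VS1": 0.767, "VS2": 0.516, "AS": 0.305}

for task, beta in betas.items():
    print(f"{task}: {100 * variance_explained(beta):.1f}% of variance")
```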


3. CONCLUSION 


A series of spatial-temporal tasks was employed with virtual versus actual materials, with one computerized 
task that involved relatively intensive loading and another that involved less, to examine whether children’s and 
adults’ ability to extract spatial-temporal information differed in real versus virtual space. Both virtual tasks 
lacked the third dimension (depth); other spatial-temporal qualities were manipulated in each trial to see 
whether the lack of depth was compensated for with distance and duration. The results highlighted two factors 
playing a role in processing perceptual properties in virtual and physical space: (1) age, and (2) task 
characteristics. 

Participants from all age groups did well (the youngest group relatively well) with the physical 
presentations, but responses on the virtual tasks varied considerably across development. Young 
children in particular found comparing the speed of the three objects on a computerized display most difficult. 
Thirteen out of seventeen nursery children declared that when the objects were fast, the three-bunny version was 
harder to follow. Four reception children expressed similar feelings, and some of them talked about an 
interaction between distance and speed, making some trials in the second version of the virtual task 
harder to judge: perceived duration was distorted when the distance between the two fast objects widened. 

The actual space task results indicated that some reception children were capable of distinguishing 
between distance and duration. This finding contradicts the view of Piaget (1969/2006; see also Piaget & 
Inhelder, 1971), who found that young children judged duration by the distance taken, and argued that this 
was the result of their confusion between spatial and temporal dimensions, as they assumed a longer distance 
would be equal to a longer duration. His view was that young children do not reliably distinguish between 
more complex spatial-temporal characteristics such as velocity until about age nine. In contrast, other studies 
found that young children demonstrate early implicit knowledge of time, speed, and duration when the tasks 
involve more practical elements with which children have direct contact in their life (see e.g. Bullock et al., 
1982; Dündar-Coecke et al., 2019; Wilkening, 1981). According to Siegler and Richards (1979), the 
distance-travelled cue in young children’s judgments is affected by spatial characteristics; what develops with 
age is that children rely less on spatial elements in their temporal judgments as they get older. Later, Arlin 
(1989) isolated the spatial cue by lifting objects of different weights for fixed durations. He found that spatial 
cues alone did not affect duration judgments; unlike those of older children, young children’s judgments were 
affected by other types of manipulations. 
The virtual speed tasks in particular required the ability to deal with this type of spatial-temporal 
manipulation, relying heavily on cognitive resources such as attention and memory, consistent with 
Arlin’s finding. This also resonates with the cognitive account, namely that age-related changes in 
spatial-temporal cognition are linked to changes in memory and attentional processes (see also Droit-Volet, 
2011), supporting the argument that children’s perceptions were not rigidly encapsulated from higher-level 
inference. This was the case in particular for the virtual tasks. 

Comparing children’s and adults’ response profiles on the actual versus computerized speed tasks, young 
children did not seem capable of dealing with asymmetrical manipulations of virtual space, and they seemed to 
be weak in their approximations. Probably for that reason, most young children requested to repeat some of 
the trials. Note that the trials were not repeated. This finding supports option ‘a’, highlighting the 
importance of the characteristics of virtual and physical space in affecting perceptual processes. 

Children’s struggle with the virtual tasks may be related to their inability to predict future positions of 
the objects (cf. Burns et al., 2018; Friedman, 2003; McCormack & Hanley, 2011). However, children did not 
find it difficult to predict future relations with the actual materials. One can ask whether sequencing the past 
is inherently easier than predicting the future in virtual space. The present study cannot provide an answer to 
this question, but children’s performance on the virtual tasks can be analyzed as follows: the virtual tasks 
required children to calibrate horizontal distance and speed. In these tasks the perceptual cues were 
limited; in particular, the third dimension (depth) was not available. It is unknown whether the third 
dimension would support children’s estimations of the future positions, but given that the missing dimension 
is one of the quantitative aspects of the perceptual primitives, this highlights the importance of the completeness 
of spatial-temporal information in fostering distinct cognitive capacities. 

Evidently, the characteristics of the spaces did matter in predicting future outcomes, and extracting 
spatial-temporal information from virtual settings seems to be much harder for children. It is probable that 
various reasons play a role in this: (1) processing virtual spatial-temporal qualities may be computationally 
intensive, (2) perceptual primitives seem to be considered as immaterial, fast, and hypothetical, and 
(3) calibrating these seems to be more effortful than calibrating their real-world counterparts. The results 
resonate with Reiner’s (2018) findings underlining the involvement of other higher mental domains to support 
performance in virtual environments. We now have some idea about why this occurs: in line with the literature 
showing differences in perception of virtual versus actual properties, and the results here showing that young 
children are more likely to struggle with extracting spatial-temporal information from virtual space, I argue that, 
evolutionarily, our ability to extract spatial-temporal information from real environments may be more 
advanced. This has clear implications for educational implementations, because children’s problem 
solving abilities can vary due to individual differences. This variance may increase in an unfavorable way 
when the role of the intensity of information presentation in virtual versus actual environments is ignored. 